A DATA ANALYSIS OF SUCCESS IN OCS,
THE USE OF ASVAB WAIVERS,
AND RACE

R. R. Read
L. R. Whitaker



Abstract 

Success in Officer Candidate School (OCS) occurs at the same rate
regardless of whether the candidates received a mental aptitude
qualification waiver based upon their score on the electronics portion of
the Armed Services Vocational Aptitude Battery (ASVAB). However,
these rates do change with race and time, and the result is an apparent
contradiction: the macro rates (those computed overall, without
distinguishing race and time) differ depending upon whether a waiver is
present. The data are studied to expose the contradiction and develop
sharper models.

I. Introduction 

The accession of officers into the Marine Corps includes using one of three 
mental aptitude test scores: Armed Services Vocational Aptitude Battery 
Electronics Repair Composite (called ASVAB herein), the Scholastic Aptitude 
Test (SAT), and the American College Test (ACT). Historically, 55% of the 
officers entering the Corps use the first of these three, and the qualification 
threshold is a score of 120. But a candidate can receive a waiver of this minimum 
provided his score is 115 or better. The paper treats only those using the ASVAB 
test. 

Based on data collected over the fiscal years 1988 through 1992 and broken 
out by race, personnel at the Manpower Analysis (MA) Branch at Marine Corps 
Headquarters noticed that success at the Officer Candidate School (OCS) appears
to be independent of whether an officer has received an ASVAB waiver.
Specifically, there are four racial groups: Caucasian, Black, Hispanic, and Other.
The Other group consists largely of American Indian, Alaskan Native, Asian, and
Pacific Islander candidates. When the data are collapsed over time, the four 2x2
contingency table tests for independence yield the chi square test statistics
.6678, 2.841, .7983, and .5767 for the respective races, each with one degree of
freedom. None of these is significant. However, when the data are further
collapsed over race and a single test for independence is performed, the
relationship is highly significant. This latter 2x2 table appears in Table 1.
The chi square statistic is 11.87 and the p-value is 0.00057.

On the surface, it appears that we have contradictory results. On the one 
hand, OCS candidate success and the presence of a waiver are independent when 
Caucasians, Blacks, Hispanics and Others are considered separately. On the other 
hand, there is dependence in the collapsed table when race is not accounted for, 
with strong evidence that the chance of success without a waiver is greater than 
that with a waiver. 



Table 1. Macro Analysis of Success and Waiver

             Waiver    No Waiver    Total
Success         754         7449     8203
Failure         299         2303     2602
Total          1053         9752    10805
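The test on Table 1 can be checked with a few lines of code. The sketch below is illustrative only: the function name `pearson_chi2_2x2` is ours, and the p-value uses the identity P(chi square with 1 df > x) = erfc(sqrt(x/2)), which avoids any dependence on a statistics library.

```python
import math


def pearson_chi2_2x2(table):
    """Pearson X^2 and its 1-df p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / N.
            expected = row_totals[i] * col_totals[j] / n
            x2 += (observed - expected) ** 2 / expected
    # Upper tail of the chi square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(x2 / 2.0))
    return x2, p_value


# Rows: success, failure. Columns: waiver, no waiver (Table 1).
table1 = [[754, 7449], [299, 2303]]
x2, p = pearson_chi2_2x2(table1)
print(f"X2 = {x2:.2f}, p = {p:.5f}")
```

The result should agree with the statistic and p-value quoted in the text to the precision shown there.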



A short answer to the contradiction can be obtained through an interpretation of
the two success rates. They are not significantly different for waiver and
non-waiver within racial groups. But the rates change sharply from group to group.
Indeed, the use of the waiver varies markedly from group to group and, to a
lesser extent, from year to year. This is surely related to the implementation of
the Marine Corps Affirmative Action Plan.
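The short answer can be made concrete with the column totals of Table 2. The following sketch (variable names are ours, chosen for illustration) computes, for each race, the success rate with and without a waiver, the share of candidates holding a waiver, and the Pearson statistic, and then repeats the computation after collapsing over race.

```python
# (success, failure) totals over FY88-FY92, from the "Total" rows of Table 2.
WAIVER = {"Caucasian": (491, 153), "Black": (136, 92),
          "Hispanic": (78, 40), "Other": (49, 14)}
NO_WAIVER = {"Caucasian": (6312, 1818), "Black": (326, 167),
             "Hispanic": (390, 165), "Other": (421, 153)}


def chi2_2x2(a, b, c, d):
    """Pearson X^2 for the 2x2 table [[a, b], [c, d]] (shortcut formula)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


per_race = {}
for race in WAIVER:
    sw, fw = WAIVER[race]        # success / failure with a waiver
    sn, fn = NO_WAIVER[race]     # success / failure without a waiver
    per_race[race] = {
        "rate_waiver": sw / (sw + fw),
        "rate_no_waiver": sn / (sn + fn),
        "waiver_share": (sw + fw) / (sw + fw + sn + fn),
        "x2": chi2_2x2(sw, sn, fw, fn),
    }

# Collapsing over race reproduces the macro table (Table 1).
SW = sum(s for s, _ in WAIVER.values())
FW = sum(f for _, f in WAIVER.values())
SN = sum(s for s, _ in NO_WAIVER.values())
FN = sum(f for _, f in NO_WAIVER.values())
pooled_rate_waiver = SW / (SW + FW)
pooled_rate_no_waiver = SN / (SN + FN)
pooled_x2 = chi2_2x2(SW, SN, FW, FN)

for race, r in per_race.items():
    print(f"{race:10s} waiver {r['rate_waiver']:.3f}  no waiver "
          f"{r['rate_no_waiver']:.3f}  share {r['waiver_share']:.2f}  "
          f"X2 {r['x2']:.4f}")
print(f"Pooled     waiver {pooled_rate_waiver:.3f}  no waiver "
      f"{pooled_rate_no_waiver:.3f}  X2 {pooled_x2:.2f}")
```

Within each race the two rates are close and the statistics are small; the gap opens only in the pooled line, where groups with very different waiver shares are mixed.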

This paper contains an explanation of the contradiction, and attention is
drawn to other interesting facets as well. In Section II the raw data are presented
and all 2 x 2 tables of success/failure by waiver/non-waiver are studied for each
year/racial group pair. Generally, independence is tenable. To explain the
non-independence, the full data, aggregated over years and with race as a factor, are
then subjected to a log-linear analysis in Section III. In Section IV, we fit models
with time as a factor including the use of the waiver by year and race. These
models could be valuable because an ill-advised long-term overuse of the waiver
could lead to inequities in the future advancement to higher rank [3].

Categorical data are prevalent in military OR. Thus, we take a careful look at
the data and provide details that would normally be omitted so that certain 
usage may be illustrated. In particular, in the next section, attention is drawn to 
the rather interesting effects when conditional tests are used, and in Section III 
the steps for fitting a loglinear model are presented. 

The factors of interest are success or failure of candidates in the OCS program,
whether the candidate used an ASVAB (lower mental category) waiver, fiscal
year, and race. The data (see Table 2) consist of counts

$$D_{ijkl}$$

where i = 1, 2 indicates success or failure, j = 1, 2 indicates presence or absence of
a waiver, k = 1, ..., 5 indicates the fiscal year FY88 to FY92, and l = 1, ..., 4 indicates
race, in the order given earlier.
Table 2. Frequency Counts by Category

Candidates Qualifying with ASVAB Waiver

Success in OCS
FY       White   Black   Hispanic   Other   Total
FY88       100      11         10      12     133
FY89       142      37         12      20     211
FY90       102      30         20      11     163
FY91        77      22         14       2     115
FY92        70      36         22       4     132
Total      491     136         78      49     754

Failure in OCS
FY       White   Black   Hispanic   Other   Total
FY88        22       8          5       1      36
FY89        30      15         11       7      63
FY90        35      16         10       3      64
FY91        21      22          6       3      52
FY92        45      31          8       0      84
Total      153      92         40      14     299

Candidates Qualifying without ASVAB Waiver

Success in OCS
FY       White   Black   Hispanic   Other   Total
FY88      1113      48         48      95    1304
FY89      1533      56         80     111    1780
FY90      1263      77         76     109    1525
FY91      1013      58         78      39    1188
FY92      1390      87        108      67    1652
Total     6312     326        390     421    7449

Failure in OCS
FY       White   Black   Hispanic   Other   Total
FY88       234      14         16      31     295
FY89       323      18         22      35     398
FY90       350      50         41      38     479
FY91       430      35         38      24     527
FY92       481      50         48      25     604
Total     1818     167        165     153    2303

II. Individual Contingency Tables

Suppose the full data are broken into twenty (5 years, 4 races) 2x2 
contingency tables and subjected to individual analyses. It is instructive to apply 
the most often used procedures to each and gain experience in their use and 
effect. 

Let us simplify the notation and let $n_{ij} = D_{ijkl}$ be the counts with year and race
held fixed, where i = 1, 2 indicates success or failure in OCS, and j = 1, 2 indicates
presence or absence of a waiver, respectively. Under independence the expected
frequencies are estimated by

$$\hat{m}_{ij} = n_{i+} n_{+j} / N \quad \text{with} \quad N = \sum_{i}\sum_{j} n_{ij},$$

and the plus indicates summation over the replaced subscript. The familiar
Pearson Chi Square and Log Likelihood statistics are given by

$$X^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{(n_{ij} - \hat{m}_{ij})^2}{\hat{m}_{ij}}$$

$$G^2 = 2\sum_{i=1}^{2}\sum_{j=1}^{2} n_{ij} \ln(n_{ij} / \hat{m}_{ij})$$

Each is asymptotically distributed as chi square with one degree of freedom.

The use of the odds ratio is also popular, especially in 2 x 2 tables. It
summarizes the strength and type of dependence between the two categories.
Letting $\{\pi_{ij}\}$ be the cell probabilities, the odds ratio is defined by

$$\theta = \frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}}$$

and, in our context, represents the odds of OCS success using waivers divided by
the odds of success without the use of waivers. The null value $\theta = 1$ represents
"no effect" of waivers, or independence. The maximum likelihood estimator of $\theta$
is

$$\hat{\theta} = \frac{n_{11} n_{22}}{n_{12} n_{21}}.$$

The null distribution of $\ln(\hat{\theta})$ is well approximated by the normal distribution [1]
with the variance estimated by

$$\hat{\sigma}^2(\ln\hat{\theta}) = \sum_{i=1}^{2}\sum_{j=1}^{2} 1/n_{ij}.$$

Thus, a third test statistic is

$$z = \ln(\hat{\theta}) \Big/ \Big[\sum_{i=1}^{2}\sum_{j=1}^{2} 1/n_{ij}\Big]^{1/2}.$$
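The three asymptotic statistics are easy to compute side by side. The sketch below is ours (the function name and the returned dictionary keys are illustrative); it assumes all four cells are positive, since the odds ratio is otherwise undefined, and it uses erfc for both the 1-df chi square tail and the two-sided normal tail. The example is the Hispanic FY89 table taken from Table 2.

```python
import math


def two_by_two_tests(n11, n12, n21, n22):
    """X^2, G^2, the odds ratio, and the z statistic for one 2x2 table.

    Rows are success/failure, columns are waiver/no waiver.
    Assumes every cell is positive (needed for the odds ratio).
    """
    obs = ((n11, n12), (n21, n22))
    n = n11 + n12 + n21 + n22
    rows = (n11 + n12, n21 + n22)
    cols = (n11 + n21, n12 + n22)
    x2 = g2 = 0.0
    for i in range(2):
        for j in range(2):
            m = rows[i] * cols[j] / n          # expected count under independence
            x2 += (obs[i][j] - m) ** 2 / m
            g2 += 2.0 * obs[i][j] * math.log(obs[i][j] / m)
    theta = (n11 * n22) / (n12 * n21)          # estimated odds ratio
    sd = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    z = math.log(theta) / sd
    return {
        "theta": theta, "log_theta": math.log(theta), "sd": sd, "z": z,
        "p_z": math.erfc(abs(z) / math.sqrt(2.0)),   # two-sided normal tail
        "p_x2": math.erfc(math.sqrt(x2 / 2.0)),      # chi square tail, 1 df
        "p_g2": math.erfc(math.sqrt(g2 / 2.0)),
        "x2": x2, "g2": g2,
    }


# Hispanic candidates, FY89: success (waiver, no waiver) = (12, 80),
# failure (waiver, no waiver) = (11, 22).
result = two_by_two_tests(12, 80, 11, 22)
for key in ("theta", "log_theta", "sd", "p_z", "p_x2", "p_g2"):
    print(f"{key:10s} {result[key]:.3f}")
```

The printed values can be compared with the Hispanic FY89 row of Table 3.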



Concern for the use of asymptotics has led the authors to consider Fisher's
Exact Test as well [1, p. 60ff]. Under the null hypothesis of independence, an exact
distribution that is free of any unknown parameters results from conditioning on
the totals in both margins. The result is a hypergeometric distribution

$$P(n_{11}) = \binom{n_{+1}}{n_{11}} \binom{n_{+2}}{n_{1+} - n_{11}} \Big/ \binom{N}{n_{1+}}.$$

Since the totals in the margins are given, only $n_{11}$ need be considered as variable.
Its range is

$$\max(0,\, n_{+1} + n_{1+} - N) \le n_{11} \le \min(n_{+1},\, n_{1+}).$$

Exact two-sided p-values are obtained by summing probabilities of tables that 
are at least as rare under the null hypothesis as the observed table. Only those 
tables that have hypergeometric probabilities at least as small as the observed 
configuration are used [2]. 
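A direct, if not the most economical, way to obtain this p-value is to enumerate every admissible value of the (1,1) cell. The sketch below is an illustration of that rule and is not the algorithm of Appendix A; the function name is ours. It works with exact integer weights so that ties with the observed table are handled without rounding trouble.

```python
from math import comb


def fisher_exact_two_sided(n11, n12, n21, n22):
    """Two-sided Fisher exact p-value for a 2x2 table, by enumeration.

    Sums the hypergeometric probabilities of all tables with the same
    margins whose probability does not exceed that of the observed table.
    """
    r1 = n11 + n12            # first row total
    c1 = n11 + n21            # first column total
    c2 = n12 + n22            # second column total
    n = c1 + c2
    low = max(0, r1 + c1 - n)
    high = min(r1, c1)

    def weight(k):
        # Unnormalized probability of the table whose (1,1) cell equals k.
        return comb(c1, k) * comb(c2, r1 - k)

    observed = weight(n11)
    tail = sum(weight(k) for k in range(low, high + 1) if weight(k) <= observed)
    return tail / comb(n, r1)  # the weights sum to C(N, r1)


# Hispanic candidates, FY89 (counts from Table 2).
p_hisp_89 = fisher_exact_two_sided(12, 80, 11, 22)
print(f"Fisher two-sided p = {p_hisp_89:.3f}")
```

For tables as large as the Caucasian ones the enumeration is still fast, because only one cell varies once the margins are fixed.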

The results of the four procedures are given in Table 3, which contains the
values of total populations, N; the odds ratios, $\hat{\theta}$; $\ln(\hat{\theta})$; the standard deviation of
$\ln(\hat{\theta})$; and the four p-values. Within each fiscal year the racial levels are Caucasian,
Black, Hispanic, Other, respectively. There are some blank entries for the last case
because $n_{21} = 0$.

Perhaps the first thing to notice is the agreement of p-values for the three
asymptotic procedures. Only for the smaller values of N do they show much
separation. On the other hand, the p-values for Fisher's Exact Test generally tend
to be higher. The main reason for this is the conditioning on both margin totals.
Such is not the case in the other procedures. In the former case, the nuisance
parameters are eliminated while in the latter three procedures they are estimated.

The differences in p-values do not lead to conflicting conclusions, however.
Two cases of the twenty are significant: Hispanics '89 and Caucasians '92. In both
of these cases the odds for success are smaller if waivers are used. The opposite is
true for Caucasians '91, a case that might be controversial as p ≈ .08.

Table 3. Two-Sided p-values

                                                    ------- p-values -------
FY     Race     N      θ̂       ln θ̂    sd(ln θ̂)   Fisher    Z      X2     G2
FY88   Cauc.   1469    .956    -.045     .246       .804    .854   .854   .854
       Black     81    .401    -.914     .555       .139    .100   .094   .104
       Hisp.     79    .667    -.405     .619       .527    .513   .511   .518
       Other    139   3.916    1.365    1.061       .298    .198   .168   .126
FY89   Cauc.   2028    .997    -.003     .210      1.000    .990   .990   .990
       Black    126    .793    -.232     .409       .681    .570   .570   .571
       Hisp.    125    .300   -1.204     .482       .017    .012   .010   .014
       Other    173    .901    -.104     .480       .810    .828   .828   .829
FY90   Cauc.   1750    .808    -.213     .205       .285    .297   .296   .304
       Black    173   1.218     .197     .359       .723    .583   .583   .582
       Hisp.    147   1.079     .076     .433      1.000    .861   .861   .860
       Other    161   1.278     .245     .678      1.000    .717   .717   .712
FY91   Cauc.   1541   1.556     .442     .253       .085    .080   .078   .070
       Black    137    .603    -.506     .370       .196    .172   .170   .172
       Hisp.    136   1.137     .128     .527      1.000    .808   .808   .807
       Other     68    .410    -.892     .949       .379    .348   .335   .342
FY92   Cauc.   1986    .538    -.620     .198       .002    .002   .002   .002
       Black    204    .667    -.405     .303       .223    .181   .180   .182
       Hisp.    186   1.222     .200     .448       .828    .654   .654   .651
       Other     96     -        -        -         .570     -     .225   .116

III. General Models

The four factors, success/failure, waiver/no waiver, year (1, ..., 5), and race
(1, ..., 4), are denoted as A, B, C, D, respectively. Since the total number of OCS
candidates is not fixed, the data $D_{ijkl}$ will be assumed to be generated from an
independent Poisson sampling scheme, i.e., the $D_{ijkl}$ are independent Poisson
random variables with respective parameters $\{m_{ijkl}\}$, where $m_{ijkl} = E[D_{ijkl}]$. To
interpret the results given in the introduction we first fit a loglinear model to the
counts collapsed over years, i.e., to $D_{ij+l}$. The saturated loglinear model
parameterizes $m_{ij+l} = E[D_{ij+l}]$ as

$$\ln m_{ij+l} = \mu + \lambda_i^A + \lambda_j^B + \lambda_l^D + \lambda_{ij}^{AB} + \lambda_{il}^{AD} + \lambda_{jl}^{BD} + \lambda_{ijl}^{ABD}, \quad i = 1, 2; \; j = 1, 2; \; l = 1, \ldots, 4,$$

where the $\lambda$'s are the effects and interaction terms corresponding to the variables
A, B, D. Using standard notation [1], this saturated model can be represented as
[ABD], i.e., the third order interaction term ABD and all lower order terms made
up of subsets of the variables A, B, and D are included in the model. We begin by
fitting the model with all two-way interaction terms along with all main effects,
i.e., the model [AB] [AD] [BD]. This gives a likelihood ratio test statistic of 2.55
with 3 degrees of freedom and a p-value of .466. This model does fit the data. To
see whether a more parsimonious model can be fit we remove two-way
interaction terms one at a time. This yields the model [AD] [BD]. The overall
likelihood ratio test statistic is 4.84 with 4 degrees of freedom giving an
acceptable p-value of .31. To see whether anything has been lost by removing the
AB interaction term, we test the null hypothesis [AD] [BD] versus the alternative
[AB] [AD] [BD]. The test statistic 1.99 with 1 degree of freedom has a p-value of
.256. There is not enough evidence to indicate that the AB term should be
included. Further, deleting terms from the [AD] [BD] model yields models with 
unacceptable fits, i.e., those with likelihood ratio test statistics having p-values 
less than .05. Finally, the standardized residuals for the [AD] [BD] model range 
from -.843 to 1.090. Thus, the model [AD] [BD] is selected and fits the data 
(collapsed over years) reasonably well. 
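Because [AD] [BD] states that A and B are independent within each level of D, its fitted values have a closed form (row total times column total over table total, race by race), and no iterative fitting is needed. The sketch below relies on that standard property; the function names are ours, and the p-value routine is a closed-form chi square tail that is valid only for an even number of degrees of freedom. The computed statistic should land close to the 4.84 reported above; small differences may reflect rounding.

```python
import math

# Counts collapsed over years (Table 2 totals).
# For each race: ((success waiver, success no waiver),
#                 (failure waiver, failure no waiver)).
COUNTS = {
    "Caucasian": ((491, 6312), (153, 1818)),
    "Black": ((136, 326), (92, 167)),
    "Hispanic": ((78, 390), (40, 165)),
    "Other": ((49, 421), (14, 153)),
}


def fit_conditional_independence(counts):
    """Fitted values and G^2 for the model [AD] [BD]."""
    fitted, g2 = {}, 0.0
    for race, table in counts.items():
        n = sum(sum(row) for row in table)
        row_tot = [sum(row) for row in table]
        col_tot = [table[0][j] + table[1][j] for j in range(2)]
        fitted[race] = [[row_tot[i] * col_tot[j] / n for j in range(2)]
                        for i in range(2)]
        for i in range(2):
            for j in range(2):
                g2 += 2.0 * table[i][j] * math.log(table[i][j] / fitted[race][i][j])
    df = len(counts)   # one degree of freedom per 2x2 table
    return fitted, g2, df


def chi2_upper_tail_even_df(x, df):
    """P(chi square_df > x), closed form for even df only."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))


fitted, g2, df = fit_conditional_independence(COUNTS)
p_value = chi2_upper_tail_even_df(g2, df)
print(f"G2 = {g2:.2f} on {df} df, p = {p_value:.2f}")
```

The same per-race decomposition is why the statistic is roughly the sum of the four single-table statistics quoted in the introduction.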

The question now becomes, can this model account for the results that
motivated the study? The probabilistic interpretation of the model [AD] [BD] is
that, conditional on the levels of factor D (race), the variables A and B are
independent. To see this, note that the joint probability mass function (pmf) of the
variables A, B, C, D is

$$p_{ijkl} = \frac{m_{ijkl}}{m_{++++}}$$

for i = 1, 2; j = 1, 2; k = 1, ..., 5; and l = 1, ..., 4. The model [AD] [BD] fitted to the
data collapsed over years corresponds to

$$\ln m_{ij+l} = \mu + \lambda_i^A + \lambda_j^B + \lambda_l^D + \lambda_{il}^{AD} + \lambda_{jl}^{BD}. \qquad (2.1)$$

Thus the conditional pmf of A given that B is at level j and D is at level l can be
found from this model to be

$$p_{i|jl} = \frac{m_{ij+l}}{m_{+j+l}} = \frac{\exp\{\mu + \lambda_i^A + \lambda_l^D + \lambda_{il}^{AD}\}}{\sum_{i'} \exp\{\mu + \lambda_{i'}^A + \lambda_l^D + \lambda_{i'l}^{AD}\}}. \qquad (2.2)$$

Since the right hand side of (2.2) is not a function of j, we see that the conditional
pmf of A given B, D is the same as the conditional pmf of A given D. Thus given
D, the factors A and B are independent.
However, A and B are not independent by themselves alone. The marginal
probabilities of these two factors can be developed from the model (2.1) by
summing

$$\exp\{\mu + \lambda_i^A\} \sum_{l}\sum_{j} \exp\{\lambda_j^B + \lambda_l^D + \lambda_{il}^{AD} + \lambda_{jl}^{BD}\}$$

and

$$\exp\{\mu + \lambda_j^B\} \sum_{l}\sum_{i} \exp\{\lambda_i^A + \lambda_l^D + \lambda_{il}^{AD} + \lambda_{jl}^{BD}\}$$

and forming the appropriate normalizations. The joint probability is not the
product of these probabilities. Thus the model supports the observation made
earlier that success of the OCS candidate is not independent of whether the
ASVAB waiver has been used for entry. These two variables are independent,
however, when broken out by race.

The following probabilities help interpret the dependence between A and B.
The probabilities of success given race are estimated to be .78, .64, .70, .74 for
Caucasians, Blacks, Hispanics and Others, respectively. (The empirical rates and
the modeled rates are the same to two decimal places.) The proportions of
candidates in each race who possess a waiver are .07, .32, .18, .10, and the
proportions of candidates who don't possess a waiver in each race are the
complementary values, .93, .68, .82, .90. Caucasians have the greatest proportion
of candidates who don't possess a waiver (93%), together with a good chance of
success (78%). Waiver use, however, is proportionally heaviest among
Blacks (32%) and Hispanics (18%). Because the probabilities of success for
these two races are lower (64% and 70%, respectively), we see that the overall
probability of success with a waiver is lower than without a waiver. Also, the
four success rates decrease monotonically as the four waiver use rates increase.
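The argument can be turned into arithmetic. Under [AD] [BD], the probability of success given waiver status is a weighted average of the four race-specific success rates, with weights equal to the racial composition of the waiver (or non-waiver) group. The sketch below computes both averages from the Table 2 totals; the names are ours and the calculation is an illustration of the reasoning rather than part of the original analysis.

```python
# Totals over FY88-FY92 from Table 2, as (success, failure) by race.
WAIVER = {"Caucasian": (491, 153), "Black": (136, 92),
          "Hispanic": (78, 40), "Other": (49, 14)}
NO_WAIVER = {"Caucasian": (6312, 1818), "Black": (326, 167),
             "Hispanic": (390, 165), "Other": (421, 153)}

# P(success | race): under the model this does not depend on waiver status.
p_success = {}
for race in WAIVER:
    successes = WAIVER[race][0] + NO_WAIVER[race][0]
    total = sum(WAIVER[race]) + sum(NO_WAIVER[race])
    p_success[race] = successes / total


def implied_success_rate(group):
    """Sum over races of P(success | race) * P(race | waiver status)."""
    group_total = sum(sum(cell) for cell in group.values())
    return sum(p_success[race] * sum(group[race]) / group_total
               for race in group)


model_rate_waiver = implied_success_rate(WAIVER)
model_rate_no_waiver = implied_success_rate(NO_WAIVER)
print(f"Implied P(success | waiver)    = {model_rate_waiver:.3f}")
print(f"Implied P(success | no waiver) = {model_rate_no_waiver:.3f}")
```

The implied rates differ even though success and waiver are independent within every race, which is the contradiction of the introduction in miniature.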
IV. Temporal Analysis

The above analysis responds to the question posed in the introduction. But it 
is also of interest to consider the other factor, C, the fiscal year. If including the 
variable race sheds light on the dependence between having a waiver and 
success of the OCS candidate, perhaps considering this fourth variable will add 
to an understanding of this data set. 

Perhaps the most direct way to proceed is to consider the most general four 
factor model that reflects independence of factors A and B. In the notation 
established this would be [ACD] [BCD]. All interactions involving both A and B are
zero. Doing so produces a likelihood ratio p-value of .049. This is rather small for
our tastes. Study of the residuals reveals two outlier cells: unsuccessful Hispanics 
with a waiver in FY89 and unsuccessful Caucasians with a waiver in FY92. These 
two cells belong to the same cases that exhibited low p-values in Table 3. 

It appears that the loglinear modeling system must provide for some AB 
interactive terms. Accordingly we apply the strategy which fits the models with 
all three way and lower order terms; all two way and lower order terms; and all 
one way terms. Then the overall model with the fewest terms and an acceptable 
overall fit is used as a starting point for further deletion of terms within the 
chosen set. The first model fit was the one with all three way interactions. This 
gives an overall fit with a p-value of .0387. However, as terms are deleted the 
p-value increases and the model [ABC] [BCD] [ACD] gives a slightly higher 
p-value for overall fit of .0657. Further deletion of terms leads to the model [ABC] 
[BCD] [AD] with p-value .22. 

The fact that the deletion of additional terms appears to improve the fit can be
explained by noting the increase in the degrees of freedom. For the model with
all three way interaction terms, the likelihood ratio test statistic is 21.95 with 12
degrees of freedom; deleting the ABD term increases the degrees of freedom to 15
and the test statistic to 24.01; and the deletion of the ABD term increases the
degrees of freedom to 19 and the test statistic to 29.548. Therefore deleting terms
does not increase the test statistic very much compared to the gain in degrees of
freedom.

Deleting either the ABC or BCD terms from the [AD] [ABC] [BCD] model 
results in models with much lower p-values for overall goodness of fit and 
standardized residuals that are of much larger magnitude than those of the [AD] 
[ABC] [BCD] model. Since the standardized residuals for this model range 
between -1.78 to 1.81, this model appears to give an adequate fit. In passing, we 
note that all AB interactive terms are modest in size. 

The estimated probabilities of success given race, waiver status and fiscal year
$\{\hat{p}_{i|jkl}\}$ are plotted against year (k) in Figures 1 and 2. There is a general decrease
in the probability of success over time in all four racial groups regardless of 
waiver status. In fact, when the model [AD] [BD] is fit to years separately, only 
1992 fails to fit with a p-value = .01. It appears that for the first four years this 
trend is reasonably well modeled as independent of waiver status. The presence 
of the ABC interaction term in the temporal model is a consequence of changes in 
1992, specifically the outlier cell cited earlier. 

The presence of the BCD interaction term can be explained by changes in the 
number of waivers utilized over time. To examine this, we fit a logistic regression 
model where the response variable is one or zero according to whether an 
individual received a waiver or not, and the explanatory variables are years and 
race. Since years is in fact an ordinal variable, it was scored as the integers 1 to 5 
for the years 1988 to 1992. This saves degrees of freedom and helps detect 
monotonic trends. 
[Figure 1. Estimated P[S | waiver, race, year] vs Year, with one curve for each race: Caucasian, Black, Hispanic, Other.]

[Figure 2. Estimated P[S | no waiver, race, year] vs Year, with one curve for each race: Caucasian, Black, Hispanic, Other.]

The model with a cubic term in years gives an adequate fit to the data
(p-value = .112). This model fits the data somewhat better than the model that fits
the year as a categorical variable.

The fitted values are the estimates of the conditional probabilities that an 
officer receives a waiver given year and race. These are plotted by race in 
Figure 3. From this plot it can be seen that except for 1989 there has been a 
general decline in the proportion of waivers awarded for each race. 



[Figure 3. Estimated P[waiver | race, year] vs Year, with one curve for each race: Caucasian, Black, Hispanic, Other.]



In conclusion, we have accounted for the nature of the paradox stated in the
introduction by the use of loglinear analysis after collapsing the data over time.
The odds ratio analysis served to support, at a micro-level, the hypothesis that
success is independent of the waiver, and deeper loglinear modeling can be used to
quantify the changes in probabilities as functions of race and time. The final
analysis collapses the data over OCS success or failure and treats the use of the
waiver. Waiver use appears to be diminishing in time, but there are some rather
prominent separations by race. Some additional study in these areas can be found in [3].
APPENDIX A

Algorithm to produce p-values for the hypergeometric distribution.

Let us view our basic 2x2 table as

        a      b      S
        c      d      F
        n_1    n_2    N

In the context of the report, a is the number of successful candidates among the
$n_1$ that used waivers; b is the number of successful candidates among the $n_2$ that
did not use waivers, etc. The probabilistic structure used is a conditional one,

$$P(a \mid a + b = S) = \binom{n_1}{a}\binom{n_2}{b} \Big/ \binom{N}{S}, \qquad (A.1)$$

which is a hypergeometric probability function. For the present purposes it is
useful to describe the variable range constraints rather elaborately:

$$\max(0, S - n_2) \le a \le \min(S, n_1)$$
$$\max(0, S - n_1) \le b \le \min(S, n_2)$$
$$\max(0, F - n_2) \le c \le \min(F, n_1)$$
$$\max(0, F - n_1) \le d \le \min(F, n_2)$$

Let us analyze the computations. Let $P_0$ be the value of (A.1) for the observed
table. The p-value is the sum of all probabilities (A.1) which are less than or equal
to $P_0$. Let

$$C = n_1!\, n_2!\, S!\, F! / N!$$

Then (A.1) can be expressed as

$$P = C / (a!\, b!\, c!\, d!). \qquad (A.2)$$

In the p-value computation the value of C is fixed and only the other factor in
(A.2) changes as the summation takes place. It is often wise to use logarithms in
the computation because the factorials can get quite large. Also the two-sided
p-value computation is managed by identifying the two tails of the distribution
and summing their contributions.

Our approach is to first identify the variable (a, b, c, d) that has the shortest
range in the specific situation. To do this we compute the empirical odds ratio

$$\hat{\theta} = \frac{ad}{bc}$$

and determine the case $\hat{\theta} < 1$ or $\hat{\theta} > 1$. This identifies the tail that contains the
experimental result. That is, we view the testing problem as $H_0: p_1 = p_2$ vs
$H_1: p_1 \ne p_2$. The two estimators are

$$\hat{p}_1 = a / n_1 \quad \text{and} \quad \hat{p}_2 = b / n_2.$$

It is easily seen that $\hat{p}_1 < \hat{p}_2$ is equivalent to $\hat{\theta} < 1$; and the opposite case with
$\hat{\theta} > 1$. Thus if $\hat{\theta} < 1$ we choose M = min(a, d) and sum the hypergeometric terms
for that tail of the distribution. Of course, if $\hat{\theta} > 1$ we choose M = min(b, c) for the
single tail sum. If M = 0 in either case then $P_0$ is the total probability for that tail.
To illustrate, we have

$$P_0 = C / (a!\, b!\, c!\, d!)$$

and for $\hat{\theta} < 1$ we form the successive terms

$$R_0 = 1, \quad R_1 = R_0 \frac{ad}{(b+1)(c+1)}, \quad R_2 = R_1 \frac{(a-1)(d-1)}{(b+2)(c+2)}, \; \ldots, \; R_M = R_{M-1} \frac{(a+1-M)(d+1-M)}{(b+M)(c+M)} \qquad (A.3)$$

and the single tail probability is $P_0 \sum_{i=0}^{M} R_i$.

On the other hand, if $\hat{\theta} > 1$ the R's are formed differently. That is

$$R_0 = 1, \quad R_1 = R_0 \frac{bc}{(a+1)(d+1)}, \quad R_2 = R_1 \frac{(b-1)(c-1)}{(a+2)(d+2)}, \; \ldots, \; R_M = R_{M-1} \frac{(b+1-M)(c+1-M)}{(a+M)(d+M)} \qquad (A.4)$$

To manage the opposite tail let us redefine the R's in the following way. For
the case $\hat{\theta} < 1$ we change to M = min(b, c) and choose

$$R_1 = \frac{bc}{(a+1)(d+1)}, \quad R_2 = R_1 \frac{(b-1)(c-1)}{(a+2)(d+2)}, \; \ldots, \; R_M = R_{M-1} \frac{(b+1-M)(c+1-M)}{(a+M)(d+M)} \qquad (A.5)$$

which matches (A.4) except that $R_0 = 1$ is not in the set. The opposite tail
probability is obtained by summing

$$P_0 \sum_i R_i \quad \text{for all } R_i \le 1. \qquad (A.6)$$

The opposite tail for the case $\hat{\theta} > 1$ is managed similarly. This time
M = min(a, d) and we define a new set of R's according to the form of (A.3), but
omitting $R_0 = 1$. Then apply the formula (A.6).
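The recursion lends itself to a short program. The sketch below is one possible rendering, not the authors' code: the function names are ours, $P_0$ is obtained through log-gamma functions as the text recommends, and, as a simplification, the filter $R_i \le 1$ of (A.6) is applied in both directions instead of branching on the empirical odds ratio. The two approaches agree whenever the terms in the observed tail decrease, which is the situation the text describes. A small tolerance guards against rounding when a term ties with $P_0$.

```python
import math


def log_p0(a, b, c, d):
    """Natural log of the probability (A.2) of the observed table."""
    n1, n2, s, f = a + c, b + d, a + b, c + d
    log_c = (sum(math.lgamma(x + 1) for x in (n1, n2, s, f))
             - math.lgamma(n1 + n2 + 1))
    return log_c - sum(math.lgamma(x + 1) for x in (a, b, c, d))


def tail_ratios(down1, down2, up1, up2):
    """Ratios R_1, ..., R_M of table probabilities to P_0.

    Each step lowers the cells (down1, down2) by one and raises the
    cells (up1, up2) by one, as in (A.3) and (A.4); M = min(down1, down2).
    """
    ratios, r = [], 1.0
    for i in range(1, min(down1, down2) + 1):
        r *= (down1 + 1 - i) * (down2 + 1 - i) / ((up1 + i) * (up2 + i))
        ratios.append(r)
    return ratios


def two_sided_p(a, b, c, d, tol=1e-9):
    """Sum of the probabilities of all tables no more likely than the observed one."""
    p0 = math.exp(log_p0(a, b, c, d))
    toward_small_a = tail_ratios(a, d, b, c)   # terms of the form (A.3)
    toward_large_a = tail_ratios(b, c, a, d)   # terms of the form (A.4)
    kept = [r for r in toward_small_a + toward_large_a if r <= 1.0 + tol]
    return p0 * (1.0 + sum(kept))              # the leading 1 is R_0


# Hispanic candidates, FY89: a = 12, b = 80, c = 11, d = 22 (Table 2).
p_example = two_sided_p(12, 80, 11, 22)
print(f"two-sided p = {p_example:.3f}")
```

Working with ratios means that only one factorial expression, $P_0$, is ever evaluated; every other term follows from a handful of multiplications.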
APPENDIX B

The estimated coefficients, their standard errors and t-values for the model
[AD] [ABC] [BCD] are given in Table B1. The coefficients are constrained so that
one level of each factor has a coefficient that is set to zero. For example, for factor
A there is only one estimated coefficient, $\lambda_1^A$, corresponding to success at OCS; the
coefficient corresponding to failure in OCS, $\lambda_2^A$, is set to zero. Thus, the estimated
value .9438 is a contrast and the t-value 9.24 tests the null hypothesis that the
main effects for levels 1 and 2 of factor A are the same. Since A has only 2 levels
this is equivalent to $H_0: \lambda_1^A = \lambda_2^A = 0$. The main effects in Table B1 are labeled as
follows:

    A     $\lambda_1^A$    (Success in OCS)
    B     $\lambda_1^B$    (Waiver)
    C1    $\lambda_2^C$    (FY89)
    C2    $\lambda_3^C$    (FY90)
    C3    $\lambda_4^C$    (FY91)
    C4    $\lambda_5^C$    (FY92)
    D1    $\lambda_3^D$    (Hispanic)
    D2    $\lambda_4^D$    (Other)
    D3    $\lambda_1^D$    (Caucasian)

All other main effects are set to zero. Interaction terms are similarly treated.


Table B1

Term           Value    Std. Error   t-value
(Intercept)    2.855      0.147       19.45
A              0.944      0.102        9.24
D1            -0.135      0.198       -0.68
D2             0.481      0.180        2.67
D3             2.640      0.144       18.35
B             -1.122      0.297       -3.77
C1             0.151      0.183        0.83
C2             0.910      0.165        5.50
C3             0.820      0.173        4.73
C4             1.087      0.163        6.68
A:D1           0.225      0.116        1.94
A:D2           0.304      0.121        2.51
A:D3           0.576      0.085        6.80
A:B           -0.085      0.199       -0.43
A:C1           0.036      0.085        0.42
A:C2          -0.278      0.083       -3.36
A:C3          -0.638      0.083       -7.70
A:C4          -0.438      0.080       -5.48
B:C1           0.888      0.364        2.44
B:C2           0.182      0.358        0.51
B:C3           0.300      0.365        0.82
B:C4           0.633      0.342        1.85
B:D1          -0.264      0.389       -0.68
B:D2          -1.084      0.392       -2.76
B:D3          -1.217      0.280       -4.35
C:D1           0.288      0.235        1.23
C:D2          -0.025      0.211       -0.12
C:D3           0.133      0.176        0.76
C:D4          -0.101      0.220       -0.46
C:D5          -0.546      0.197       -2.77
C:D6          -0.513      0.160       -3.22
C:D7           0.220      0.227        0.97
C:D8          -1.057      0.226       -4.68
C:D9          -0.269      0.169       -1.59
C:D10          0.119      0.214        0.56
C:D11         -1.080      0.206       -5.25
C:D12         -0.422      0.158       -2.68
A:B:C1        -0.083      0.252       -0.33
A:B:C2        -0.031      0.254       -0.12
A:B:C3         0.210      0.266        0.79
A:B:C4        -0.306      0.249       -1.23
B:C:D1        -0.865      0.487       -1.78
B:C:D2        -0.075      0.472       -0.16
B:C:D3        -0.752      0.493       -1.52
B:C:D4        -0.648      0.462       -1.40
B:C:D5        -0.248      0.480       -0.52
B:C:D6        -0.245      0.512       -0.48
B:C:D7        -0.710      0.635       -1.12
B:C:D8        -1.308      0.661       -1.98
B:C:D9        -0.792      0.343       -2.31
B:C:D10       -0.220      0.341       -0.65
B:C:D11       -0.740      0.351       -2.11
B:C:D12       -0.806      0.332       -2.43

Table B2 contains the fitted cell means along with the standardized residuals.
The standardized residuals are plotted against the fitted values in Figure B1.

Table B2

Cell   Count   FY     Race     Fitted Value   Std. Residual
  1      100   FY88   Cauc.        98.540          0.147
  2       11   FY88   Black        13.347         -0.663
  3       10   FY88   Hisp.        11.208         -0.368
  4       12   FY88   Other         9.905          0.644
  5      142   FY89   Cauc.       137.662          0.368
  6       37   FY89   Black        36.016          0.163
  7       12   FY89   Hisp.        16.980         -1.276
  8       20   FY89   Other        20.342         -0.076
  9      102   FY90   Cauc.       103.464         -0.144
 10       30   FY90   Black        29.175          0.152
 11       20   FY90   Hisp.        20.539         -0.119
 12       11   FY90   Other         9.822          0.369
 13       77   FY91   Cauc.        71.784          0.608
 14       22   FY91   Black        26.670         -0.933
 15       14   FY91   Hisp.        13.166          0.227
 16        2   FY91   Other         3.380         -0.813
 17       70   FY92   Cauc.        76.628         -0.768
 18       36   FY92   Black        35.432          0.095
 19       22   FY92   Hisp.        17.527          1.027
 20        4   FY92   Other         2.414          0.932
 21     1113   FY88   Cauc.      1112.644          0.011
 22       48   FY88   Black        44.632          0.498
 23       48   FY88   Hisp.        48.823         -0.118
 24       95   FY88   Other        97.901         -0.295
 25     1533   FY89   Cauc.      1532.608          0.010
 26       56   FY89   Black        53.801          0.298
 27       80   FY89   Hisp.        78.468          0.172
 28      111   FY89   Other       115.123         -0.387
 29     1263   FY90   Cauc.      1251.554          0.323
 30       77   FY90   Black        83.893         -0.763
 31       76   FY90   Hisp.        82.953         -0.774
 32      109   FY90   Other       106.601          0.232
 33     1013   FY91   Cauc.      1020.580         -0.238
 34       58   FY91   Black        53.558          0.599
 35       78   FY91   Hisp.        73.037          0.574
 36       39   FY91   Other        40.826         -0.288
 37     1390   FY92   Cauc.      1397.537         -0.202
 38       87   FY92   Black        85.477          0.164
 39      108   FY92   Hisp.       105.300          0.262
 40       67   FY92   Other        63.687          0.412
 41       22   FY88   Cauc.        23.460         -0.305
 42        8   FY88   Black         5.653          0.928
 43        5   FY88   Hisp.         3.792          0.591
 44        1   FY88   Other         3.095         -1.389
 45       30   FY89   Cauc.        34.338         -0.757
 46       15   FY89   Black        15.984         -0.249
 47       11   FY89   Hisp.         6.020          1.817
 48        7   FY89   Other         6.658          0.131
 49       35   FY90   Cauc.        33.536          0.251
 50       16   FY90   Black        16.825         -0.203
 51       10   FY90   Hisp.         9.461          0.174
 52        3   FY90   Other         4.178         -0.607
 53       21   FY91   Cauc.        26.216         -1.056
 54       22   FY91   Black        17.330          1.076
 55        6   FY91   Hisp.         6.834         -0.326
 56        3   FY91   Other         1.620          0.968
 57       45   FY92   Cauc.        38.372          1.041
 58       31   FY92   Black        31.568         -0.101
 59        8   FY92   Hisp.        12.473         -1.357
 60        0   FY92   Other         1.586         -1.781
 61      243   FY88   Cauc.       243.356         -0.023
 62       14   FY88   Black        17.368         -0.837
 63       16   FY88   Hisp.        15.177          0.209
 64       31   FY88   Other        28.099          0.538
 65      323   FY89   Cauc.       323.392         -0.022
 66       18   FY89   Black        20.199         -0.499
 67       22   FY89   Hisp.        23.532         -0.319
 68       36   FY89   Other        31.877          0.715
 69      350   FY90   Cauc.       361.446         -0.605
 70       50   FY90   Black        43.107          1.024
 71       41   FY90   Hisp.        34.047          1.154
 72       38   FY90   Other        40.399         -0.381
 73      430   FY91   Cauc.       422.420          0.368
 74       35   FY91   Black        39.442         -0.721
 75       38   FY91   Hisp.        42.963         -0.773
 76       24   FY91   Other        22.174          0.383
 77      481   FY92   Cauc.       473.463          0.345
 78       50   FY92   Black        51.523         -0.213
 79       48   FY92   Hisp.        50.700         -0.383
 80       25   FY92   Other        28.313         -0.635

Figure B1

Standardized Residuals vs. Fitted Values
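
The tabulated standardized residuals appear to agree, to the three decimals shown, with Poisson deviance residuals computed from the observed count and the fitted value. This is an inference from the numbers in the table (rows 26, 44, 47, 53 and 60 were checked), not something the text states, and the function name below is hypothetical. A minimal sketch under that assumption:

```python
import math


def deviance_residual(observed, fitted):
    """Signed Poisson deviance residual for one cell (assumed formula)."""
    if observed == 0:
        # The term observed * log(observed / fitted) vanishes as observed -> 0.
        deviance = 2.0 * fitted
    else:
        deviance = 2.0 * (observed * math.log(observed / fitted) - (observed - fitted))
    # Guard against tiny negative values caused by rounding.
    root = math.sqrt(max(deviance, 0.0))
    return math.copysign(root, observed - fitted)


# (row index, observed count, fitted value) copied from the table above.
for row, obs, fit in [(26, 56, 53.801), (47, 11, 6.020), (60, 0, 1.586)]:
    print(row, round(deviance_residual(obs, fit), 3))
```

A Pearson residual, (observed - fitted) / sqrt(fitted), gives about 2.03 for row 47 rather than the tabulated 1.817, which is why the deviance form is assumed here.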
APPENDIX C

Short analysis of the model [ACD] [BCD].

This model features the conditional independence of factors A and B given
the levels of C and D, coupled with a fully saturated modeling of the joint
distribution of C and D. Thus the loglinear representation can be made more
succinct than the direct representation. Since

$$p_{ij|k\ell} = p_{i|k\ell}\, p_{j|k\ell},$$

the maximum likelihood estimates of the two factors on the right hand side are

$$\frac{n_{i+k\ell}}{n_{++k\ell}} \quad \text{and} \quad \frac{n_{+jk\ell}}{n_{++k\ell}}$$

respectively. It follows that for each $(k, \ell)$ pair, the loglinear model of the
left hand side may be expressed as

$$\ln m_{ij|k\ell} = \text{const} + \lambda^{A}_{i|k\ell} + \lambda^{B}_{j|k\ell},$$

and estimates of these parameters can be obtained rather easily from the twenty
2x2 tables that lie behind Table 3. The maximum likelihood estimators of
$m_{ij|k\ell}$ are

$$n_{i+k\ell}\, n_{+jk\ell} / n_{++k\ell}$$

and match the expected frequencies in the 2x2 contingency table computations.
Next, the model calls for the saturated version of $p_{k\ell}$, so that

$$\ln m_{++k\ell} = \mu + \lambda^{C}_{k} + \lambda^{D}_{\ell} + \lambda^{CD}_{k\ell}$$

with the customary constraints. The maximum likelihood estimators are

$$\hat{m}_{++k\ell} = n_{++k\ell},$$

and the rules of conditional and marginal expectation,

$$m_{ijk\ell} = m_{++k\ell}\, p_{ij|k\ell},$$

lead to the estimates

$$\hat{m}_{ijk\ell} = \hat{m}_{i+k\ell}\, \hat{m}_{+jk\ell} / \hat{m}_{++k\ell}.$$

This point is especially convenient in that it allows chi squared test statistics
for the model [ACD] [BCD] to be constructed merely by summing the individual
chi squared statistics computed from the original twenty contingency tables. The
degrees of freedom for this sum are the total of the individual table degrees of
freedom.
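
The summation described above can be sketched as follows, assuming each 2x2 table is available as a nested list of counts. The counts shown are invented for illustration and are not the paper's data; the function names are hypothetical.

```python
def expected_counts(table):
    """Expected counts n_{i+} * n_{+j} / n_{++} for one 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def pearson_chi2(table):
    """Pearson chi-square statistic for independence in one 2x2 table."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


def pooled_statistic(tables):
    """Sum the per-table statistics; each 2x2 table contributes 1 degree of freedom."""
    statistic = sum(pearson_chi2(t) for t in tables)
    degrees_of_freedom = len(tables)
    return statistic, degrees_of_freedom


# Two made-up (year, race) tables; rows are levels of A, columns are levels of B.
example_tables = [[[10, 20], [30, 40]], [[15, 5], [25, 55]]]
stat, df = pooled_statistic(example_tables)
print(f"chi-square = {stat:.3f} on {df} degrees of freedom")
```

With twenty tables the pooled statistic would be referred to a chi-square distribution with twenty degrees of freedom. The sketch does not guard against a table with an empty row or column, where an expected count would be zero.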
APPENDIX D



This appendix contains the details of a logistic regression model in which the
response variable is whether or not an individual possesses a waiver and the
explanatory variables are year and race. Following the notation established
in the paper, let $p_{k\ell}$, for $k = 1, \ldots, 5$ and $\ell = 1, \ldots, 4$,
be the probability that an individual of race $\ell$ in year $k$ possesses a waiver.
Because years is an ordinal variable, it is treated as numeric with FY88, ..., FY92
scored as 1, ..., 5 respectively. The logistic regression models fit to
$\ln(p_{k\ell}/(1-p_{k\ell}))$, along with the likelihood ratio test statistic $G^2$
and the corresponding p-values, are as follows:





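     Model                                                                            $G^2$   degrees of freedom   p-value
1.   $\mu + \lambda^C_1 k + \lambda^D_\ell$                                           25.11   15                   .048
2.   $\mu + \lambda^C_1 k + \lambda^C_2 k^2 + \lambda^D_\ell$                         23.43   14                   .054
3.   $\mu + \lambda^C_1 k + \lambda^C_2 k^2 + \lambda^C_3 k^3 + \lambda^D_\ell$       19.36   13                   .112
4.   $\mu + \lambda^C_1 k + \lambda^C_2 k^2 + \lambda^C_3 k^3 + \lambda^C_4 k^4 + \lambda^D_\ell$   18.95   12     .090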



The fits of models 1 and 2 are inadequate. This is confirmed in Figures
D1 and D2, where the standardized residuals are plotted against years. The
pattern of the residuals in both figures suggests that higher order polynomials in
k need to be fit to the data. The model 3 fit is acceptable, and the residuals (Figure
D3) appear to be evenly scattered when plotted against years. The hypothesis test
between models 3 and 4 has likelihood ratio test statistic 19.36 - 18.95 = 0.41 with 1
degree of freedom and p-value .52. Note that model 4 is equivalent to fitting the
logistic regression model where both race and years are treated as categorical
variables.
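
The p-values quoted above can be checked from the reported statistics and degrees of freedom. The sketch below uses the closed-form upper tail of the chi-square distribution for integer degrees of freedom, so it needs only the standard library; the function name `chi2_sf` is hypothetical, and this is not a reconstruction of how S-PLUS computed the values.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-y) * sum_{j=0}^{m-1} y^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd df = 2m + 1: erfc(sqrt(y)) + exp(-y) * sum_{j=1}^{m} y^(j - 1/2) / Gamma(j + 1/2)
    term = math.sqrt(y) / math.gamma(1.5)
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= y / (j + 0.5)
    return math.erfc(math.sqrt(y)) + math.exp(-y) * total


# (G^2, degrees of freedom) for models 1 to 4, as reported in the table.
for g2, df in [(25.11, 15), (23.43, 14), (19.36, 13), (18.95, 12)]:
    print(df, round(chi2_sf(g2, df), 3))

# Likelihood ratio test of model 3 against model 4: difference in G^2 on 1 df.
print(round(chi2_sf(19.36 - 18.95, 1), 2))
```

The nested-model comparison works because model 3 is model 4 with the quartic coefficient set to zero, so the difference in $G^2$ is referred to a chi-square distribution with 13 - 12 = 1 degree of freedom.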

The standardized residuals and fitted values given in Table D1 are plotted in 
Figure D4. 

Table D1

Race    Year    Fitted $p_{k\ell}$    Standardized Residuals
Cauc.   FY88    0.0775                 0.7799
Cauc.   FY89    0.0876                -0.4536
Cauc.   FY90    0.0760                 0.3658
Cauc.   FY91    0.0627                 0.1402
Cauc.   FY92    0.0618                -0.7288
Black   FY88    0.3361                -1.9954
Black   FY89    0.3665                 1.0672
Black   FY90    0.3311                -1.8583
Black   FY91    0.2873                 0.8670
Black   FY92    0.2841                 1.3853
Hisp.   FY88    0.1882                 0.0378
Hisp.   FY89    0.2094                -0.7101
Hisp.   FY90    0.1848                 0.5946
Hisp.   FY91    0.1558                -0.2835
Hisp.   FY92    0.1537                 0.2836
Other   FY88    0.1010                -0.2953
Other   FY89    0.1138                 1.6707
Other   FY90    0.0990                -0.5201
Other   FY91    0.0821                -0.2613
Other   FY92    0.0809                -1.5438

The coefficients $\lambda^D_\ell$, $\ell = 1, \ldots, 4$, corresponding to the factor
race are over-parameterized without an additional constraint. Familiar constraints
are the "sum to zero" and the "set to zero" constraints. Statistical packages usually
use one of these two constraints. S-PLUS, the package used for the analysis
presented in this paper, uses neither. Instead, let

$$x_1 = (1, k, k^2, k^3, 0, 0, 3)$$
$$x_2 = (1, k, k^2, k^3, -1, -1, -1)$$
$$x_3 = (1, k, k^2, k^3, 1, -1, -1)$$
$$x_4 = (1, k, k^2, k^3, 0, 2, -1).$$

Then, for example, in model 3 the fitted values $\hat{p}_{k\ell}$ can be found from

$$\ln\left(\frac{\hat{p}_{k\ell}}{1 - \hat{p}_{k\ell}}\right) = x_\ell\, \hat{\beta},$$

where $\hat{\beta}^T = (\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_7)$ (along with estimated
standard deviations and t-values) is given in Table D2.


Table D2

i                      $\hat{\beta}_i$   std error   t-values
1   (Intercept)        -2.3634           .3800       -6.219
2   ($\lambda^C_1$)     1.0065           .4822        2.0687
3   ($\lambda^C_2$)     -.3842           .1791       -2.145
4   ($\lambda^C_3$)      .0399           .0198        2.012
5                       -.3906           .0647       -6.034
6                       -.3717           .0496       -7.501
7                       -.2583           .0186      -13.9202
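
As a check on the vectors $x_\ell$ and the coefficients in Table D2, the fitted probabilities of Table D1 can be recomputed by applying the inverse logit to $x_\ell \hat{\beta}$. The sketch below assumes that $\ell = 1, \ldots, 4$ corresponds to Cauc., Black, Hisp., Other, the order used in Table D1; the names `BETA`, `RACE_CONTRASTS` and `fitted_probability` are hypothetical.

```python
import math

# Coefficients beta_1, ..., beta_7 copied from Table D2.
BETA = [-2.3634, 1.0065, -0.3842, 0.0399, -0.3906, -0.3717, -0.2583]

# Last three entries of x_1, ..., x_4; the race labels are an assumed mapping.
RACE_CONTRASTS = {
    "Cauc.": (0, 0, 3),
    "Black": (-1, -1, -1),
    "Hisp.": (1, -1, -1),
    "Other": (0, 2, -1),
}


def fitted_probability(k, race):
    """Model 3 fitted waiver probability for year score k (1 = FY88, ..., 5 = FY92)."""
    x = [1.0, k, k ** 2, k ** 3, *RACE_CONTRASTS[race]]
    linear_predictor = sum(b * xi for b, xi in zip(BETA, x))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


for race in RACE_CONTRASTS:
    print(race, [round(fitted_probability(k, race), 4) for k in range(1, 6)])
```

Under this mapping the results agree with Table D1 to about three decimals; small differences in the last digit are expected because the coefficients in Table D2 are themselves rounded.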



S-PLUS uses Helmert contrasts to generate the linear combinations of the
parameters used for each level of a categorical factor.
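
For illustration, the Helmert coding as commonly defined (for example by `contr.helmert` in S and R) can be generated with the short sketch below. The function name is hypothetical, and the remark about level ordering is an observation from the vectors $x_\ell$ above, not something the text states.

```python
def helmert_contrasts(n_levels):
    """Return an n_levels x (n_levels - 1) Helmert contrast matrix as nested lists.

    Column c (1-based) compares level c + 1 with the c levels before it:
    the earlier levels get -1, level c + 1 gets c, and later levels get 0.
    """
    matrix = []
    for level in range(n_levels):          # 0-based level index
        row = []
        for c in range(1, n_levels):       # contrast number
            if level < c:
                row.append(-1)
            elif level == c:
                row.append(c)
            else:
                row.append(0)
        matrix.append(row)
    return matrix


for row in helmert_contrasts(4):
    print(row)
```

The four rows produced for a four-level factor are (-1, -1, -1), (1, -1, -1), (0, 2, -1) and (0, 0, 3). These match the last three entries of $x_2$, $x_3$, $x_4$ and $x_1$ respectively, which suggests that the race levels were stored in a different order from the one used for $\ell$.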
Figure D1



Standardized Residuals vs. Years
for the model $\ln\left(\frac{p_{k\ell}}{1 - p_{k\ell}}\right) = \mu + \lambda^C_1 k + \lambda^D_\ell$
Figure D2



Standardized Residuals vs. Years
for the model $\ln\left(\frac{p_{k\ell}}{1 - p_{k\ell}}\right) = \mu + \lambda^C_1 k + \lambda^C_2 k^2 + \lambda^D_\ell$
Figure D3

Standardized Residuals vs. Years
for the model $\ln\left(\frac{p_{k\ell}}{1 - p_{k\ell}}\right) = \mu + \lambda^C_1 k + \lambda^C_2 k^2 + \lambda^C_3 k^3 + \lambda^D_\ell$
Figure D4

Standardized Residuals vs. Fitted Probabilities
for the model $\ln\left(\frac{p_{k\ell}}{1 - p_{k\ell}}\right) = \mu + \lambda^C_1 k + \lambda^C_2 k^2 + \lambda^C_3 k^3 + \lambda^D_\ell$